
All Questions

0 votes
3 answers
70 views

Using LLM/AI tools to identify entity types

I am working with a dataset that has a list of organization names, but the "type" of each organization is not given. What I mean by type is that I know that organizations within my list can fall ...
asked by Kuantew
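One way to approach this without labeled training data is zero-shot classification. A minimal sketch, assuming Hugging Face's zero-shot-classification pipeline; the model choice, organization names, and candidate type labels below are illustrative placeholders, not taken from the question:

```python
from transformers import pipeline

# Zero-shot classification scores each candidate label against the input
# text without any task-specific fine-tuning.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hypothetical names and types -- substitute the actual list and taxonomy.
org_names = ["Acme Robotics Inc.", "St. Mary's Hospital", "Springfield School District"]
candidate_types = ["company", "hospital", "school", "government agency", "nonprofit"]

for name in org_names:
    result = classifier(name, candidate_labels=candidate_types)
    print(name, "->", result["labels"][0])  # labels come back sorted by score
```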
0 votes
0 answers
21 views

Finding Contextual Synonyms that are not necessarily Grammatical Synonyms

I'm trying to learn whether there is a way to use ML to find a list of contextual synonyms for a word in a sentence. I know of some obvious approaches where you mask the word and have some model predict ...
asked by sharkeater123
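The masking approach the question mentions can be tried directly with a fill-mask pipeline. A minimal sketch, assuming Hugging Face's transformers with roberta-base; the example sentence is a hypothetical illustration:

```python
from transformers import pipeline

# Fill-mask predicts plausible replacements for the masked token in context,
# which can surface contextual rather than purely dictionary synonyms.
fill = pipeline("fill-mask", model="roberta-base")

# RoBERTa's mask token is <mask>; the sentence is hypothetical.
sentence = "The bank approved the <mask> for the new house."
for pred in fill(sentence, top_k=5):
    print(pred["token_str"].strip(), round(pred["score"], 3))
```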
-1 votes
1 answer
64 views

Where is machine learning leading? [closed]

I was looking at the progress of the more popular LLMs over the last few years and wondering whether, in the near future, through the use of semi-exhaustive methods, only those ...
asked by GEP
2 votes
2 answers
211 views

Is Llama3 fully open-source, including tokenizer, transformers, and other components needed to build a custom LLM?

I'm trying to understand whether Llama 3 (or other open-source models) is fully open-source. Specifically, I would like to know: Is the source code for Llama 3 (including the tokenizer, transformers, ...
asked by mlibre
5 votes
2 answers
297 views

Are the model implementations in Hugging Face’s transformers library created by the original model authors or by Hugging Face?

I've been exploring the implementation of models like Llama in Hugging Face’s transformers library, for example: Hugging Face's Llama model implementation. I’m ...
asked by mlibre
4 votes
1 answer
168 views

In the Manifold Hypothesis applied to LLMs, are text sequences points or paths on the manifold?

The Manifold Hypothesis makes a ton of sense to me for images. Images are points in high dimensional space, where each dimension corresponds to the intensity value of a single pixel. For example, we ...
asked by Stephen W.
3 votes
1 answer
1k views

Why do we use learnable positional encoding instead of sinusoidal positional encoding?

In the original transformer paper, positional encoding is used to capture the position of each word in the sentence, and it is calculated using sin and cos, as shown in the image. In BERT ...
asked by LAILA EL OUEDEGHYRY
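For reference, a minimal NumPy sketch of the sinusoidal encoding from the original paper, where PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); BERT instead learns a position-embedding table of the same shape:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(max_len)[:, None]        # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

print(sinusoidal_positional_encoding(50, 64).shape)  # (50, 64)
```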
1 vote
2 answers
889 views

How are Q, K, V calculated in multi-head attention?

I want to understand the transformer architecture, so I started with self-attention and understood its mechanism, but when I moved on to multi-head attention I ran into some difficulties, like how ...
asked by LAILA EL OUEDEGHYRY
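A minimal PyTorch sketch of the standard construction: Q, K, and V all come from the same input via three learned linear projections, and multi-head attention splits each projection into num_heads smaller subspaces (the dimensions here are illustrative):

```python
import torch

batch, seq_len, d_model, num_heads = 2, 10, 64, 8
head_dim = d_model // num_heads

x = torch.randn(batch, seq_len, d_model)
W_q = torch.nn.Linear(d_model, d_model)  # learned projections
W_k = torch.nn.Linear(d_model, d_model)
W_v = torch.nn.Linear(d_model, d_model)

def split_heads(t):
    # (batch, seq, d_model) -> (batch, heads, seq, head_dim)
    return t.view(batch, seq_len, num_heads, head_dim).transpose(1, 2)

q, k, v = split_heads(W_q(x)), split_heads(W_k(x)), split_heads(W_v(x))
scores = q @ k.transpose(-2, -1) / head_dim ** 0.5  # scaled dot-product
attn = torch.softmax(scores, dim=-1) @ v            # (batch, heads, seq, head_dim)
print(attn.shape)
```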
4 votes
2 answers
1k views

Why does different noise in a GAN generate different images?

I understand that noise $z$ serves as the input to the generator. Noise $z$ is essentially a vector of random numbers, typically drawn from a Gaussian distribution, with a chosen size such as $100$. However, I ...
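The key point is that a trained generator is a deterministic function of $z$, so different draws land on different outputs. A toy PyTorch sketch with a hypothetical, untrained generator (the architecture and sizes are placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical toy generator: 100-dim noise -> flattened 28x28 "image".
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),
)

z1 = torch.randn(1, 100)  # one draw from N(0, I)
z2 = torch.randn(1, 100)  # a different draw
img1, img2 = generator(z1), generator(z2)
print(torch.allclose(img1, img2))  # False: different z, different image
```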
1 vote
1 answer
140 views

Fine-tuning, feature extraction, or both with RoBERTa?

I'm reading a program that uses the pre-trained RoBERTa model (roberta-base). The code first extracts word embeddings from each caption in the batch, using the last hidden state of the RoBERTa model. ...
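For context, a minimal sketch of the feature-extraction half: run roberta-base frozen and read off the last hidden state (the caption is a hypothetical example). Fine-tuning would instead leave gradients enabled and update the weights through a task head:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
model.eval()

caption = "A dog playing in the park."  # hypothetical caption
inputs = tokenizer(caption, return_tensors="pt")
with torch.no_grad():  # frozen weights: pure feature extraction
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state  # (1, seq_len, 768)
print(embeddings.shape)
```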
2 votes
2 answers
1k views

What technique is used for training Large Language Models like GPT?

I'm learning about GenAI, such as GPT (Generative Pretrained Transformer), and I'm particularly interested in understanding the training techniques used for these models. Deep learning, generally, can ...
asked by Exploring
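GPT-style pretraining is self-supervised next-token prediction (causal language modeling): the labels are the input shifted by one position, so no human annotation is needed. A minimal sketch with Hugging Face's gpt2 checkpoint (the text is illustrative):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Large language models are trained to predict the next token."
inputs = tokenizer(text, return_tensors="pt")

# Passing labels=input_ids makes the model compute the causal LM loss;
# internally the labels are shifted so each position predicts the next token.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # cross-entropy over next-token predictions
```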
0 votes
0 answers
60 views

Understanding the concept of embeddings in the RoBERTa architecture

I'm reading the implementation file of the RoBERTa architecture, specifically the RobertaEmbedding class, which has the comment: ...
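Conceptually, that embedding layer sums a few lookup tables and normalizes the result. A simplified sketch of the idea, not the actual Hugging Face code (the token ids are hypothetical; one RoBERTa-specific detail is that position ids start at padding_idx + 1 rather than 0):

```python
import torch
import torch.nn as nn

# Simplified sketch, not the actual Hugging Face code. Sizes match
# roberta-base: vocab 50265, hidden 768, max positions 514, pad id 1.
word_emb = nn.Embedding(50265, 768, padding_idx=1)
pos_emb = nn.Embedding(514, 768)
type_emb = nn.Embedding(1, 768)
norm, drop = nn.LayerNorm(768), nn.Dropout(0.1)

input_ids = torch.tensor([[0, 31414, 232, 2]])  # hypothetical token ids
# RoBERTa starts position ids at padding_idx + 1 = 2 rather than 0.
position_ids = torch.arange(2, 2 + input_ids.size(1)).unsqueeze(0)
token_type_ids = torch.zeros_like(input_ids)

out = drop(norm(word_emb(input_ids) + pos_emb(position_ids) + type_emb(token_type_ids)))
print(out.shape)  # (1, 4, 768)
```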
0 votes
1 answer
222 views

How can I interpret the attention weights matrix? Is it reliable?

I've fine-tuned two different models (BERT and RoBERTa) on a dataset for a binary classification task, and I'm comparing the sentences where the models predict incorrectly. I decided to use attention weights ...
asked by Shayan
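Extracting the matrices is straightforward; whether they explain predictions is debated in the interpretability literature. A minimal sketch, using bert-base-uncased here only as a placeholder for the actual fine-tuned checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint -- substitute the actual fine-tuned model path.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, output_attentions=True)

inputs = tokenizer("This movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
last_layer = outputs.attentions[-1]
print(last_layer.mean(dim=1).shape)  # head-averaged (batch, seq, seq)
```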
0 votes
1 answer
134 views

Using naive Bayes vs. a transformer-based model for human-annotated data?

I have a Reddit dataset with thousands of online posts about the economy and inflation. We have used human annotation on 60% of the posts to determine whether users blame the following entities for the ...
asked by maldini1990
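A cheap baseline is usually worth running before reaching for a transformer. A minimal scikit-learn sketch with hypothetical posts and binary "blames this entity" labels:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical annotated subset -- substitute the real posts and labels.
posts = [
    "Inflation is the government's fault",
    "Prices are up because of supply chains",
    "The central bank caused this mess",
    "Global demand is driving prices higher",
]
labels = [1, 0, 1, 0]  # 1 = blames the entity, 0 = does not

baseline = make_pipeline(TfidfVectorizer(), MultinomialNB())
baseline.fit(posts, labels)
print(baseline.predict(["They blame the government for inflation"]))
```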
2 votes
1 answer
136 views

NLP "small" model to improve "big" model

When training a model for NLP, is it important to remove data that has "bad semantics" from the learning process? My plan is to create a "small model" that can decide whether data ...
asked by Milkmaid
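One way to sketch the filter-then-train idea: score each example with a small classifier and keep only what passes. The pipeline, model, and labels below are placeholders, not a recommendation:

```python
from transformers import pipeline

# Hypothetical quality filter using zero-shot classification.
quality = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

corpus = ["A clean, well-formed sentence.", "asdf qwerty zzzz !!"]
keep = [
    text for text in corpus
    if quality(text, candidate_labels=["well-formed", "garbage"])["labels"][0] == "well-formed"
]
print(keep)  # only accepted examples feed the "big" model's training set
```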
